BOOM: Beyond Only One Modality KIT's Multimodal Multilingual Lecture Companion

Koneru, Sai, Retkowski, Fabian, Huber, Christian, Hilgert, Lukas, Akti, Seymanur, Ugan, Enes Yavuz, Waibel, Alexander, Niehues, Jan

arXiv.org Artificial Intelligence

The globalization of education and rapid growth of online learning have made localizing educational content a critical challenge. Lecture materials are inherently multimodal, combining spoken audio with visual slides, which requires systems capable of processing multiple input modalities. To provide an accessible and complete learning experience, translations must preserve all modalities: text for reading, slides for visual understanding, and speech for auditory learning. We present BOOM, a multimodal multilingual lecture companion that jointly translates lecture audio and slides to produce synchronized outputs across three modalities: translated text, localized slides with preserved visual elements, and synthesized speech. This end-to-end approach enables students to access lectures in their native language while aiming to preserve the original content in its entirety. Our experiments demonstrate that slide-aware transcripts also yield cascading benefits for downstream tasks such as summarization and question answering. We release our Slide Translation code at https://github.com/saikoneru/image-translator and integrate it in Lecture Translator at https://gitlab.kit.edu/kit/isl-ai4lt/lt-middleware/ltpipeline. All released code and models are licensed under the MIT License.


TIFIN India at SemEval-2025: Harnessing Translation to Overcome Multilingual IR Challenges in Fact-Checked Claim Retrieval

Devadiga, Prasanna, Suneesh, Arya, Rajpoot, Pawan Kumar, Hazarika, Bharatdeep, Baliga, Aditya U

arXiv.org Artificial Intelligence

We address the challenge of retrieving previously fact-checked claims in monolingual and crosslingual settings - a critical task given the global prevalence of disinformation. Our approach follows a two-stage strategy: a reliable baseline retrieval system using a fine-tuned embedding model and an LLM-based reranker. Our key contribution is demonstrating how LLM-based translation can overcome the hurdles of multilingual information retrieval. Additionally, we focus on ensuring that the bulk of the pipeline can be replicated on a consumer GPU. Our final integrated system achieved a success@10 score of 0.938 and 0.81025 on the monolingual and crosslingual test sets, respectively.
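The retrieve-then-rerank strategy described above can be sketched as follows. This is a minimal illustration only: a bag-of-words cosine similarity stands in for the fine-tuned embedding model, and `llm_rerank_score` is a hypothetical placeholder for the LLM-based reranker.

```python
# Two-stage retrieval sketch: cheap retrieval over all claims,
# expensive reranking over a small shortlist.
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Stand-in for a fine-tuned dense embedding model.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def llm_rerank_score(query: str, doc: str) -> float:
    # Placeholder: a real system would prompt an LLM to judge relevance.
    return cosine(embed(query), embed(doc))

def retrieve(query: str, claims: list[str],
             k_retrieve: int = 10, k_final: int = 3) -> list[str]:
    q = embed(query)
    # Stage 1: retrieval over the full fact-checked claim collection.
    candidates = sorted(claims, key=lambda c: cosine(q, embed(c)),
                        reverse=True)[:k_retrieve]
    # Stage 2: reranking applied only to the shortlist.
    return sorted(candidates, key=lambda c: llm_rerank_score(query, c),
                  reverse=True)[:k_final]

claims = [
    "vaccines cause autism claim debunked",
    "moon landing was staged claim",
    "climate data fact check",
]
top = retrieve("was the moon landing staged", claims)
```

Keeping stage 1 cheap is what lets the bulk of such a pipeline run on a consumer GPU, with the costly LLM reranker touching only the top-k shortlist.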


Google CEO, major tech leaders join first lady Melania Trump at White House AI meeting

FOX News

First lady Melania Trump is hosting an artificial intelligence meeting with top industry leaders, including Google CEO Sundar Pichai, Thursday, as she stresses the importance of managing AI's growth "responsibly." The White House Task Force on Artificial Intelligence Education will meet for the second time in the East Room of the White House Thursday afternoon. The first lady will host the meeting alongside members of the task force and private sector leaders. "I predict AI will represent the single largest growth category in our nation during the Trump Administration -- and I won't be surprised if AI becomes known as the greatest engine of progress in the history of the United States of America," the first lady said.


DIY-MKG: An LLM-Based Polyglot Language Learning System

Tang, Kenan, Li, Yanhong, Qin, Yao

arXiv.org Artificial Intelligence

Existing language learning tools, even those powered by Large Language Models (LLMs), often lack support for polyglot learners to build linguistic connections across vocabularies in multiple languages, provide limited customization for individual learning paces or needs, and suffer from detrimental cognitive offloading. To address these limitations, we design Do-It-Yourself Multilingual Knowledge Graph (DIY-MKG), an open-source system that supports polyglot language learning. DIY-MKG allows the user to build personalized vocabulary knowledge graphs, which are constructed by selective expansion with related words suggested by an LLM. The system further enhances learning through rich annotation capabilities and an adaptive review module that leverages LLMs for dynamic, personalized quiz generation. In addition, DIY-MKG allows users to flag incorrect quiz questions, simultaneously increasing user engagement and providing a feedback loop for prompt refinement. Our evaluation of LLM-based components in DIY-MKG shows that vocabulary expansion is reliable and fair across multiple languages, and that the generated quizzes are highly accurate, validating the robustness of DIY-MKG.
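The selective-expansion idea — link only the LLM suggestions the learner accepts — can be sketched as below. All names are hypothetical, and `suggest_related` is a hard-coded stand-in for the LLM call.

```python
# Sketch of a personalized vocabulary knowledge graph with selective
# expansion, in the spirit of DIY-MKG (illustrative only).

def suggest_related(word: str) -> list[str]:
    # Placeholder: a real system would ask an LLM for translations,
    # cognates, and morphological relatives across languages.
    fake_llm = {"Hund": ["dog", "hond"], "dog": ["Hund", "perro"]}
    return fake_llm.get(word, [])

class VocabGraph:
    def __init__(self):
        self.edges: dict[str, set[str]] = {}

    def add_word(self, word: str):
        self.edges.setdefault(word, set())

    def expand(self, word: str, accept: set[str]):
        """Selective expansion: only user-accepted suggestions are linked."""
        self.add_word(word)
        for related in suggest_related(word):
            if related in accept:
                self.add_word(related)
                self.edges[word].add(related)
                self.edges[related].add(word)

g = VocabGraph()
g.expand("Hund", accept={"dog"})  # user rejects "hond"
```

The accept-set gate is the key design choice: it keeps the graph personal to the learner rather than growing unboundedly with every LLM suggestion.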


Ask a Local: Detecting Hallucinations With Specialized Model Divergence

Creo, Aldan, Cerezo-Costas, Héctor, Alonso-Doval, Pedro, Hormazábal-Lagos, Maximiliano

arXiv.org Artificial Intelligence

Hallucinations in large language models (LLMs) - instances where models generate plausible but factually incorrect information - present a significant challenge for AI. We introduce "Ask a Local", a novel hallucination detection method exploiting the intuition that specialized models exhibit greater surprise when encountering domain-specific inaccuracies. Our approach computes divergence between perplexity distributions of language-specialized models to identify potentially hallucinated spans. Our method is particularly well-suited for a multilingual context, as it naturally scales to multiple languages without the need for adaptation, relying on external data sources, or performing training. Moreover, we select computationally efficient models, providing a scalable solution that can be applied to a wide range of languages and domains. Our results on a human-annotated question-answer dataset spanning 14 languages demonstrate consistent performance across languages, with Intersection-over-Union (IoU) scores around 0.3 and comparable Spearman correlation values. Our model shows particularly strong performance on Italian and Catalan, with IoU scores of 0.42 and 0.38, respectively, while maintaining cross-lingual effectiveness without language-specific adaptations. We release our code and architecture to facilitate further research in multilingual hallucination detection.
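The core signal — per-token surprisal compared across language-specialized models, with high divergence marking candidate hallucination spans — can be sketched as below. The log-probabilities are hard-coded stand-ins for real model outputs, and variance-of-surprisal is one simple divergence choice, not necessarily the paper's exact measure.

```python
# Sketch: tokens where specialized models disagree sharply in surprisal
# are flagged as potential hallucinations (illustrative only).

def surprisal(logprob: float) -> float:
    return -logprob  # in nats

def divergence(per_model_logprobs: list[list[float]]) -> list[float]:
    """Per-token variance of surprisal across models."""
    n_models = len(per_model_logprobs)
    n_tokens = len(per_model_logprobs[0])
    scores = []
    for t in range(n_tokens):
        s = [surprisal(per_model_logprobs[m][t]) for m in range(n_models)]
        mean = sum(s) / n_models
        scores.append(sum((x - mean) ** 2 for x in s) / n_models)
    return scores

def flag_spans(scores: list[float], threshold: float) -> list[int]:
    return [t for t, sc in enumerate(scores) if sc > threshold]

# Two "specialized" models; they diverge sharply on token 2.
logprobs_model_a = [-1.0, -1.2, -6.0, -0.9]
logprobs_model_b = [-1.1, -1.0, -0.8, -1.0]
scores = divergence([logprobs_model_a, logprobs_model_b])
flagged = flag_spans(scores, threshold=1.0)  # -> [2]
```

Because the signal needs only log-probabilities from off-the-shelf specialized models, the approach requires no training or external data, which is what lets it scale across languages.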


MSA at SemEval-2025 Task 3: High Quality Weak Labeling and LLM Ensemble Verification for Multilingual Hallucination Detection

Hikal, Baraa, Nasreldin, Ahmed, Hamdi, Ali

arXiv.org Artificial Intelligence

This paper describes our submission for SemEval-2025 Task 3: Mu-SHROOM, the Multilingual Shared-task on Hallucinations and Related Observable Overgeneration Mistakes. The task involves detecting hallucinated spans in text generated by instruction-tuned Large Language Models (LLMs) across multiple languages. Our approach combines task-specific prompt engineering with an LLM ensemble verification mechanism, where a primary model extracts hallucination spans and three independent LLMs adjudicate their validity through probability-based voting. This framework simulates the human annotation workflow used in the shared task validation and test data. Additionally, fuzzy matching refines span alignment. Our system ranked 1st in Arabic and Basque, 2nd in German, Swedish, and Finnish, and 3rd in Czech, Farsi, and French.
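The verification stage — independent verifiers returning probabilities, combined by voting, with fuzzy matching to align spans — can be sketched as follows. The verifier callables are placeholders for the three adjudicating LLMs.

```python
# Sketch of probability-based ensemble verification with fuzzy span
# alignment (illustrative only; verifiers are stand-ins for LLM calls).
from difflib import SequenceMatcher

def fuzzy_align(span: str, text: str) -> tuple[int, int]:
    """Locate the closest match of `span` inside `text`."""
    m = SequenceMatcher(None, text, span).find_longest_match(
        0, len(text), 0, len(span))
    return m.a, m.a + m.size

def verify(span: str, verifiers, threshold: float = 0.5) -> bool:
    """Keep a proposed span if the mean verifier probability passes."""
    probs = [v(span) for v in verifiers]
    return sum(probs) / len(probs) >= threshold

# Placeholders: real verifiers would prompt three independent LLMs.
verifiers = [lambda s: 0.9, lambda s: 0.7, lambda s: 0.2]

text = "The Eiffel Tower was built in 1999 in Paris."
span = "built in 1999"          # proposed by the primary model
start, end = fuzzy_align(span, text)
kept = verify(span, verifiers)  # mean 0.6 >= 0.5 -> kept
```

Averaging probabilities rather than taking a hard majority lets a confident verifier outweigh an uncertain one, mimicking the human annotation workflow the task's validation data was built with.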


AI Melania: First lady embarks on 'new frontier' in publishing with audiobook of memoir

FOX News

EXCLUSIVE: First lady Melania Trump is launching an audiobook of her memoir using artificial intelligence (AI) audio technology in multiple languages, Fox News Digital has learned. The first lady released her first memoir, "Melania," last year. This week, she is breaking new ground by releasing "Melania, the Audiobook," which has been "created entirely" with AI. "I am proud to be at the forefront of publishing's new frontier – the intersection of artificial intelligence technology and audio," Trump told Fox News Digital. The first lady said ElevenLabs AI developed "an AI-generated replica of my voice under strict supervision, which will establish an unforgettable connection with my personal story, in multiple languages for listeners worldwide." ElevenLabs AI CEO Mati Staniszewski told Fox News Digital that they are "excited that Melania Trump trusted our technology to power this first-of-its-kind audiobook project."


The Loneliness Epidemic Is a Security Crisis

WIRED

Loneliness has never been more urgent. On top of the significant mental health concerns, the idea that people are now lonelier and having fewer social interactions is fueling very real threats to security. Foremost among these is one of today's most pernicious digital frauds: romance scams, which exploit targets' feelings of isolation and net fraudsters hundreds of millions of dollars per year. As scammers increasingly organize their workflows and incorporate new AI technologies, it's becoming possible for them to deploy these scams at an even more vast scale. Romance scams, also known as confidence scams, are extremely communication-intensive. They require attackers to build relationships with their targets via dating apps and social media.


Comparative Approaches to Sentiment Analysis Using Datasets in Major European and Arabic Languages

Krasitskii, Mikhail, Kolesnikova, Olga, Hernandez, Liliana Chanona, Sidorov, Grigori, Gelbukh, Alexander

arXiv.org Artificial Intelligence

This study explores transformer-based models such as BERT, mBERT, and XLM-R for multilingual sentiment analysis across diverse linguistic structures. Key contributions include the identification of XLM-R's superior adaptability in morphologically complex languages, achieving accuracy levels above 88%. The work highlights fine-tuning strategies and emphasizes their significance for improving sentiment classification in underrepresented languages.


Boosting Text-To-Image Generation via Multilingual Prompting in Large Multimodal Models

Mu, Yongyu, Li, Hengyu, Wang, Junxin, Zhou, Xiaoxuan, Wang, Chenglong, Luo, Yingfeng, He, Qiaozhi, Xiao, Tong, Chen, Guocheng, Zhu, Jingbo

arXiv.org Artificial Intelligence

Previous work on augmenting large multimodal models (LMMs) for text-to-image (T2I) generation has focused on enriching the input space of in-context learning (ICL). This includes providing a few demonstrations and optimizing image descriptions to be more detailed and logical. However, as demand for more complex and flexible image descriptions grows, enhancing comprehension of input text within the ICL paradigm remains a critical yet underexplored area. In this work, we extend this line of research by constructing parallel multilingual prompts aimed at harnessing the multilingual capabilities of LMMs. More specifically, we translate the input text into several languages and provide the models with both the original text and the translations. Experiments on two LMMs across 3 benchmarks show that our method, PMT2I, achieves superior performance in general, compositional, and fine-grained assessments, especially in human preference alignment. Additionally, with its advantage of generating more diverse images, PMT2I significantly outperforms baseline prompts when incorporated with reranking methods. Our code and parallel multilingual data can be found at https://github.com/takagi97/PMT2I.
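The prompt construction step — pairing the original caption with its translations so the model sees the same description in several languages — can be sketched as below. The `translate` function is a hard-coded stand-in for a real MT system, and the prompt wording is hypothetical, not the exact PMT2I template.

```python
# Sketch of parallel multilingual prompt construction in the spirit of
# PMT2I (illustrative only).

def translate(text: str, lang: str) -> str:
    # Placeholder MT lookup; a real system would call a translator.
    fake_mt = {
        ("a red bird on a branch", "de"): "ein roter Vogel auf einem Ast",
        ("a red bird on a branch", "fr"): "un oiseau rouge sur une branche",
    }
    return fake_mt[(text, lang)]

def build_parallel_prompt(caption: str, langs: list[str]) -> str:
    lines = [f"English: {caption}"]
    lines += [f"{lang}: {translate(caption, lang)}" for lang in langs]
    lines.append("Generate an image matching the description above.")
    return "\n".join(lines)

prompt = build_parallel_prompt("a red bird on a branch", ["de", "fr"])
```

The translations cost only an MT call at prompt time, so the method needs no change to the LMM itself and composes naturally with reranking over the more diverse images it produces.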